6 research outputs found

Outcome prediction based on microarray analysis: a critical perspective on methods

Author: Blazadonakis Michalis E
Danilatou Vasiliki
Kafetzopoulos Dimitris
Tsiknakis Manolis
Tsiliki Georgia
Zervakis Michalis
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract
Background: Information extraction from microarrays has not yet been widely used in diagnostic or prognostic decision-support systems, due to the diversity of results produced by the available techniques, their instability on different data sets and the inability to relate statistical significance with biological relevance. Thus, there is an urgent need to address the statistical framework of microarray analysis and identify its drawbacks and limitations, which will enable us to thoroughly compare methodologies under the same experimental set-up and associate results with confidence intervals meaningful to clinicians. In this study we consider gene-selection algorithms with the aim of revealing inefficiencies in performance evaluation and addressing aspects that can reduce uncertainty in algorithmic validation.
Results: A computational study is performed on the performance of several gene selection methodologies on publicly available microarray data. Three basic types of experimental scenarios are evaluated, i.e. the independent test-set and the 10-fold cross-validation (CV) using maximum and average performance measures. Feature selection methods behave differently under different validation strategies. The performance results from CV do not match well with those from the independent test-set, except for the support vector machines (SVM) and the least squares SVM methods. However, these wrapper methods achieve variable (often low) performance, whereas the hybrid methods attain consistently higher accuracies. The use of an independent test-set within CV is important for the evaluation of the predictive power of algorithms. The optimal size of the selected gene-set also appears to be dependent on the evaluation scheme. The consistency of selected genes over variation of the training-set is another aspect important in reducing uncertainty in the evaluation of the derived gene signature. In all cases the presence of outlier samples can seriously affect algorithmic performance.
Conclusion: Multiple parameters can influence the selection of a gene-signature and its predictive power, thus possible biases in validation methods must always be accounted for. This paper illustrates that independent test-set evaluation reduces the bias of CV, and case-specific measures reveal stability characteristics of the gene-signature over changes of the training set. Moreover, frequency measures on gene selection address the algorithmic consistency in selecting the same gene signature under different training conditions. These issues contribute to the development of an objective evaluation framework and aid the derivation of statistically consistent gene signatures that could eventually be correlated with biological relevance. The benefits of the proposed framework are supported by the evaluation results and methodological comparisons performed for several gene-selection algorithms on three publicly available datasets.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Institutional Repository of the Technical University of Crete

Outcome Prediction in Critically-Ill Patients with Venous Thromboembolism and/or Cancer Using Machine Learning Algorithms: External Validation and Comparison with Scoring Systems

Author: Christos Tzagkarakis
Despoina Antonakaki
Dimitrios Mavroidis
Sotirios Ioannidis
Stylianos Nikolakakis
Theodoros Kostoulas
Vasiliki Danilatou
Publication venue: MDPI AG
Publication date: 01/06/2022

Intensive care unit (ICU) patients with venous thromboembolism (VTE) and/or cancer suffer from high mortality rates. Mortality prediction in the ICU has been a major medical challenge for which several scoring systems exist but lack in specificity. This study focuses on two target groups, namely patients with thrombosis or cancer. The main goal is to develop and validate interpretable machine learning (ML) models to predict early and late mortality, while exploiting all available data stored in the medical record. To this end, retrospective data from two freely accessible databases, MIMIC-III and eICU, were used. Well-established ML algorithms were implemented utilizing automated and purposely built ML frameworks for addressing class imbalance. Prediction of early mortality showed excellent performance in both disease categories, in terms of the area under the receiver operating characteristic curve (AUC–ROC): VTE-MIMIC-III 0.93, eICU 0.87, cancer-MIMIC-III 0.94. On the other hand, late mortality prediction showed lower performance, i.e., AUC–ROC: VTE 0.82, cancer 0.74–0.88. The predictive model of early mortality developed from 1651 VTE patients (MIMIC-III) ended up with a signature of 35 features and was externally validated in 2659 patients from the eICU dataset. Our model outperformed traditional scoring systems in predicting early as well as late mortality. Novel biomarkers, such as red cell distribution width, were identified.

Directory of Open Access Journals

PubMed Central

Performance validation of microarray analysis methods

Author: Banti Anna
Blazadonakis Michalis E.
Danilatou Vasiliki
Kafetzopoulos Dimitris
Tsiknakis Manolis
Zervakis Michail (http://users.isc.tuc.gr/~mzervakis)
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Summarization: Following the rapid development of gene selection methods, several comparison studies have been reported for ranking methods on various datasets. In order to reduce bias in performance measures, most studies use an evaluation scheme based on cross-validation. In this paper we focus on the methodology of evaluation itself and address methodological problems using three representative algorithms on two public datasets. More specifically, the paper discusses the need for an independent test-set to reduce bias associated with cross-validation, the use of case-specific considerations for generalization, as well as other measures that reflect stability and consistency of the result. Such measures reflect the influence of the actual dataset distribution on the performance of gene selection methods.

Institutional Repository of the Technical University of Crete

Outcome prediction in critically-ill patients with venous thromboembolism and/or cancer using machine learning algorithms: external validation and comparison with scoring systems

Author: Antonakaki Despoina
Danilatou Vasiliki
Ioannidis Sotirios (http://users.isc.tuc.gr/~sioannidis)
Kostoulas Theodoros (https://viaf.org/viaf/67165570041037431257)
Mavroidis Dimitrios
Nikolakakis Stylianos (http://users.isc.tuc.gr/~snikolakakis)
Tzagkarakis Christos
Publication venue: MDPI

Summarization: Intensive care unit (ICU) patients with venous thromboembolism (VTE) and/or cancer suffer from high mortality rates. Mortality prediction in the ICU has been a major medical challenge for which several scoring systems exist but lack in specificity. This study focuses on two target groups, namely patients with thrombosis or cancer. The main goal is to develop and validate interpretable machine learning (ML) models to predict early and late mortality, while exploiting all available data stored in the medical record. To this end, retrospective data from two freely accessible databases, MIMIC-III and eICU, were used. Well-established ML algorithms were implemented utilizing automated and purposely built ML frameworks for addressing class imbalance. Prediction of early mortality showed excellent performance in both disease categories, in terms of the area under the receiver operating characteristic curve (AUC–ROC): VTE-MIMIC-III 0.93, eICU 0.87, cancer-MIMIC-III 0.94. On the other hand, late mortality prediction showed lower performance, i.e., AUC–ROC: VTE 0.82, cancer 0.74–0.88. The predictive model of early mortality developed from 1651 VTE patients (MIMIC-III) ended up with a signature of 35 features and was externally validated in 2659 patients from the eICU dataset. Our model outperformed traditional scoring systems in predicting early as well as late mortality. Novel biomarkers, such as red cell distribution width, were identified.
Presented on: International Journal of Molecular Sciences

Institutional Repository of the Technical University of Crete